Data Augmentation for Arabic Speech Recognition Based on End-to-End Deep Learning

Authors

Abstract

End-to-end deep learning has greatly improved the performance of speech recognition systems. Even with these techniques, however, overfitting remains the main problem when only a small amount of training data is available. Data augmentation is a suitable solution to this problem: it increases the quantity of training data and improves the robustness of the models. In this paper, we investigate data augmentation for enhancing Arabic automatic speech recognition (ASR) based on end-to-end deep learning. Data augmentation is applied to the original corpus to increase the training data, using noise adaptation, pitch-shifting, and speed transformation. A CNN-LSTM and an attention-based encoder-decoder are used in building the acoustic model and in the decoding phase. This approach is considered state of the art in end-to-end deep learning, and to the best of our knowledge, no prior research has employed it for Arabic ASR. In addition, the language model is built using RNN-LM and LSTM-LM methods. The Standard Arabic Single Speaker Corpus (SASSC) without diacritics is used as the original corpus. Experimental results show that data augmentation improved the word error rate (WER) compared with the same approach without augmentation, achieving an average WER reduction of 4.55%.
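Two of the augmentation steps named above, together with the WER metric used for evaluation, can be sketched as follows. This is a minimal illustration, not the authors' implementation. It assumes mono audio held as a Python list of float samples. The function names (`add_noise`, `change_speed`, `word_error_rate`) are hypothetical, and the SNR and speed-factor values are arbitrary examples. White Gaussian noise stands in for the paper's noise adaptation, whose noise sources are not stated here. Pitch-shifting is omitted because shifting pitch without changing duration needs a time-stretching step (for example a phase vocoder) that does not fit a short sketch.

```python
import math
import random


def add_noise(signal, snr_db, rng=None):
    """Mix white Gaussian noise into `signal` at the requested SNR in dB.

    Hypothetical stand-in for noise adaptation; real systems often mix in
    recorded background noise instead of synthetic white noise.
    """
    if rng is None:
        rng = random.Random()
    signal_power = sum(s * s for s in signal) / len(signal)
    # SNR_dB = 10 * log10(P_signal / P_noise)  =>  P_noise = P_signal / 10^(SNR_dB / 10)
    noise_power = signal_power / (10 ** (snr_db / 10))
    noise_std = math.sqrt(noise_power)
    return [s + rng.gauss(0.0, noise_std) for s in signal]


def change_speed(signal, factor):
    """Resample `signal` by `factor` using linear interpolation.

    factor > 1 gives a shorter, faster clip; factor < 1 a longer, slower one.
    Tempo and pitch change together, as with simple speed perturbation.
    """
    out_len = int(len(signal) / factor)
    out = []
    for i in range(out_len):
        pos = i * factor              # fractional read position in the input
        left = int(pos)
        right = min(left + 1, len(signal) - 1)
        frac = pos - left
        out.append(signal[left] * (1 - frac) + signal[right] * frac)
    return out


def word_error_rate(reference, hypothesis):
    """WER = (substitutions + deletions + insertions) / number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    # d[j] holds the word-level edit distance between ref[:i] and hyp[:j] for the current row i.
    d = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        prev_diag, d[0] = d[0], i
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            prev_diag, d[j] = d[j], min(d[j] + 1,          # deletion
                                        d[j - 1] + 1,      # insertion
                                        prev_diag + cost)  # substitution or match
    return d[len(hyp)] / len(ref)


if __name__ == "__main__":
    rng = random.Random(0)
    # One second of a 440 Hz tone at an assumed 16 kHz sample rate.
    clip = [math.sin(2 * math.pi * 440 * n / 16000) for n in range(16000)]
    noisy = add_noise(clip, snr_db=10, rng=rng)
    fast = change_speed(clip, 1.1)
    print(len(noisy), len(fast))                 # 16000 14545
    print(word_error_rate("a b c d", "a x c"))   # 0.5
```

Each augmented copy keeps the transcript of its source clip, so the training set grows without new labeling. The WER function divides a word-level edit distance by the reference length, so it can exceed 1.0 when the hypothesis contains many insertions.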

Similar resources

Deep Speech: Scaling up end-to-end speech recognition

We present a state-of-the-art speech recognition system developed using end-to-end deep learning. Our architecture is significantly simpler than traditional speech systems, which rely on laboriously engineered processing pipelines; these traditional systems also tend to perform poorly when used in noisy environments. In contrast, our system does not need hand-designed components to model backgro...

End-to-End Deep Neural Network for Automatic Speech Recognition

We investigate the efficacy of deep neural networks on speech recognition. Specifically, we implement an end-to-end deep learning system that utilizes mel-filter bank features to directly output to spoken phonemes without the need of a traditional Hidden Markov Model for decoding. The system will comprise of two variants of neural networks for phoneme recognition. In particular, we utilize conv...

End-to-End Deep Learning for Driver Distraction Recognition

In this paper, an end-to-end deep learning solution for driver distraction recognition is presented. In the proposed framework, the features from pre-trained convolutional neural networks VGG-19 are extracted. Despite the variation in illumination conditions, camera position, driver’s ethnicity, and genders in our dataset, our best fine-tuned model, VGG-19 has achieved the highest test accuracy...

Robust end-to-end deep audiovisual speech recognition

Speech is one of the most effective ways of communication among humans. Even though audio is the most common way of transmitting speech, very important information can be found in other modalities, such as vision. Vision is particularly useful when the acoustic signal is corrupted. Multi-modal speech recognition however has not yet found wide-spread use, mostly because the temporal alignment an...

End-to-End Deep Learning Framework for Speech Paralinguistics Detection Based on Perception Aware Spectrum

In this paper, we propose an end-to-end deep learning framework to detect speech paralinguistics using perception aware spectrum as input. Existing studies show that speech under cold has distinct variations of energy distribution on low frequency components compared with the speech under ‘healthy’ condition. This motivates us to use perception aware spectrum as the input to an end-to-end learn...

Journal

Journal title: International journal of intelligent computing and information sciences

Year: 2021

ISSN: 1687-109X, 2535-1710

DOI: https://doi.org/10.21608/ijicis.2021.73581.1086